The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network
Jeffrey Pennington and Pratik Worah
An important factor contributing to the success of deep learning has been the remarkable ability to optimize large neural networks using simple first-order optimization algorithms like stochastic gradient descent. While the efficiency of such methods depends crucially on the local curvature of the loss surface, very little is actually known about how this geometry depends on network architecture and hyperparameters. In this work, we extend a recently-developed framework for studying spectra of nonlinear random matrices to characterize an important measure of curvature, namely the eigenvalues of the Fisher information matrix. We focus on a single-hidden-layer neural network with Gaussian data and weights and provide an exact expression for the spectrum in the limit of infinite width. We find that linear networks suffer worse conditioning than nonlinear networks and that nonlinear networks are generically non-degenerate. We also predict and demonstrate empirically that by adjusting the nonlinearity, the spectrum can be tuned so as to improve the efficiency of first-order optimization methods.
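As a rough numerical companion to the abstract, the sketch below estimates the spectrum of one block of the Fisher information matrix. It assumes a scalar-output network f(x) = v^T sigma(W x) with squared loss under a Gaussian output model, so that the per-example gradient with respect to the output weights v is simply the hidden activation vector and the empirical Fisher block reduces to the Gram matrix of the activations; the widths, sample count, and activation functions are illustrative choices, not values taken from the paper.

import numpy as np

def fisher_output_block_spectrum(activation, n0=1000, n1=1000, m=2000, seed=0):
    # Gaussian data (one column per example) and Gaussian first-layer weights,
    # matching the setting described in the abstract only in distributional form.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n0, m))
    W = rng.standard_normal((n1, n0)) / np.sqrt(n0)
    Y = activation(W @ X)          # hidden activations, shape (n1, m)
    F = (Y @ Y.T) / m              # empirical Fisher block for the output weights
    return np.linalg.eigvalsh(F)

# Compare the conditioning of a linear network with a nonlinear one.
eigs_linear = fisher_output_block_spectrum(lambda z: z)
eigs_tanh = fisher_output_block_spectrum(np.tanh)
print("linear eigenvalue range:", eigs_linear.min(), eigs_linear.max())
print("tanh   eigenvalue range:", eigs_tanh.min(), eigs_tanh.max())

One can compare the spread of the two empirical spectra against the abstract's claim that linear networks suffer worse conditioning than nonlinear ones.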
The Upper Bound on Knots in Neural Networks
In recent years, neural networks--and deep neural networks in particular--have succeeded in such a wide range of data-driven problems as to herald a paradigm shift in the way data science is approached. Many everyday computerized tasks--such as image and optical character recognition, the personalization of Internet search results and advertisements, and even playing games such as chess, backgammon, and Go--have been deeply impacted and vastly improved by the application of neural networks. The applications of neural networks, however, have advanced significantly more rapidly than the theoretical understanding of their successes. Elements of neural network structure--such as the division of vector spaces into convex polytopes, and the application of nonlinear activation functions--afford neural networks great flexibility to model many classes of functions with spectacular accuracy. This flexibility is embodied in universal approximation theorems (Cybenko 1989; Hornik et al. 1989; Hornik 1991; Sonoda and Murata 2015), which essentially state that neural networks can model any continuous function arbitrarily well. The complexity of neural networks, however, has also made their analytical understanding somewhat elusive. The general thrust of this paper, as well as of two companion papers (Chen et al. 2016b,a), is to explore some unsolved elements of neural network theory, and to do so in a way that is independent of specific problems. In the broadest sense, we seek to understand what models neural networks are capable of producing. There exist many variations of neural networks, such as convolutional neural networks, recurrent neural networks, and long short-term memory models, each with its own arenas of success.
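As a small, hedged illustration of the universal-approximation flexibility mentioned above (the target function, hidden width, and random-feature construction are arbitrary choices made for the demo, not taken from any of the cited papers), the following sketch fits only the output weights of a single-hidden-layer tanh network by least squares and checks how closely it tracks a smooth one-dimensional target.

import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_points = 200, 400

x = np.linspace(-np.pi, np.pi, n_points)[:, None]   # inputs, shape (n_points, 1)
target = np.sin(3 * x).ravel()                       # a continuous target function

W = 3.0 * rng.standard_normal((1, n_hidden))         # random input-to-hidden weights
b = rng.uniform(-np.pi, np.pi, size=n_hidden)        # random hidden biases
H = np.tanh(x @ W + b)                               # hidden features, (n_points, n_hidden)

v, *_ = np.linalg.lstsq(H, target, rcond=None)       # fit output weights by least squares
approx = H @ v

print("max absolute error:", np.max(np.abs(approx - target)))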